# Librarieslibrary(tidyverse)library(gmRi)library(scales)library(patchwork)library(tidyquant)library(rnaturalearth)library(sf)library(matrixStats)conflicted::conflict_prefer("select", "dplyr")conflicted::conflict_prefer("filter", "dplyr")# ggplot themetheme_set(theme_gmri(axis.line.y =element_line(),axis.ticks.y =element_line(), rect =element_rect(fill ="white", color =NA),panel.grid.major.y =element_blank(),strip.text.y =element_text(angle =0),axis.text.x =element_text(size =8),axis.text.y =element_text(size =8),legend.position ="bottom"))# labels for factor levelsarea_levels <-c("GoM", "GB", "SNE", "MAB")area_levels_long <-c("Gulf of Maine", "Georges Bank", "Southern New England", "Mid-Atlantic Bight")area_levels_all <-c("Northeast Shelf", area_levels)area_levels_long_all <-c("Northeast Shelf", area_levels_long)# table to join for swapping shorthand for long-hand namesarea_df <-data.frame(area = area_levels_long_all,survey_area = area_levels_all,area_titles = area_levels_long_all)# Degree symboldeg_c <-"\u00b0C"# Data used for Wigley estimateswigley_in <-read_csv(here::here("Data/processed/wigley_species_trawl_data.csv")) %>%left_join(area_df) %>%mutate(survey_area =factor(survey_area, levels = area_levels_all),area =factor(area, levels = area_levels_long_all),season =factor(season, levels =c("Spring", "Fall")))# Data for Mid-Atlantic Bightmab <-filter(wigley_in, survey_area=="MAB")
Median Size
This is the median individual weight, arrived at using matrixStats::weightedMedian using one of two different vectors for body-size (2): 1. length_cm the length in cm
2. ind_weight_g the estimated weight in grams (Wigley length-weight estimate).
The function also takes a second argument for weights: numlen which is the number of individuals at that size.
Code
med_size <- wigley_in %>%#filter(ind_weight_g > 12) %>% group_by(area, est_year, season) %>%summarise(median_length =weightedMedian(x = length_cm, w = numlen, na.rm = T),median_weight =weightedMedian(x = ind_weight_g, w = numlen, na.rm = T),.groups ="drop") len_plot <- med_size %>%ggplot(aes(est_year, median_length, color = season)) +geom_line(linewidth =1, alpha =0.8) +facet_grid(area~., scales ="free") +scale_color_gmri() +labs(title ="Median Length", y ="Length (cm)", x ="Year")wt_plot <- med_size %>%ggplot(aes(est_year, median_weight, color = season)) +geom_line(linewidth =1) +facet_grid(area~., scales ="free") +scale_color_gmri() +labs(title ="Median Weight", y ="Weight (g)", x ="Year")len_plot | wt_plot
Size Spectra
There’s been way too many iterations of this. The absolute values definitely change under minor tuning of input data AND fit parameters. The general vibe of the long-term changes are similar undwer these tunings, but they do change as well:
Using numlen_adj
Until this last week (!) I had been using numlen_adj in an effort to counteract the survey change in 2008.
I have reflected on this, and I no longer think its helpful to do that for this application. Potentially creating more confusion than resolving.
Mid-Atlantic Community Size Distribution Diagnostics
The MAB region has been a major source of doubt and questions about whether we are measuring “a community” in a meaningful way that is informative when viewed on long time scales.
For this exploration, I am going to limit the data to the Mid-Atlantic Bight.
I won’t be adjusting any abundances for gear changes, because we’re interested in the distribution of size, and upon reflection
I think we are creating more headache than we are solving with the numlen_adj which adjusts total tow abundance by species, but preserves the size distribution within the species.
The correction factors that the numlen_adj adjustments aim to match are only applied to some species, so we are modifying their relative abundances to other species which do not have their own correction factors.
Let’s Get Our Bearings
There is a wealth of information in the trawl dataset, however it is also a very large region and I think its important to understand what level of spatio-temporal coverage we have in terms of samples/area in each season.
Typical Effort
Trawl sampling occurs two times a year, in the spring and fall. Sampling effort is fairly consistent in effort and duration, and occurs at two distinct periods on the typical seasonal variation of temperature.
Code
# How Many stations do we typically have in a season:stations <- mab %>%distinct(year = est_year, season, area, station, tow, decdeg_beglon, decdeg_beglat, est_towdate)station_effort_p <- stations %>%group_by(year, season, area) %>%summarise(n_stations =n()) %>%ggplot(aes(x = year, y = n_stations, color = season)) +geom_line(show.legend = F, linewidth =1) +scale_y_continuous(limits =c(0, 70)) +scale_color_gmri() +facet_wrap(~area, scales ="free", ncol =1) +labs(x ="Year", y ="Survey Stations", title ="How Much Do We Sample?")# How Much Time Does it Take to Cover that Effort# How Many stations do we typically have in a season:sample_periods <- mab %>%distinct(year = est_year, season, area, station, tow, decdeg_beglon, decdeg_beglat, est_towdate) %>%group_by(year, area, season) %>%summarise(seas_start =min(est_towdate),seas_end =max(est_towdate),seas_length =max(est_towdate) -min(est_towdate),.groups ="drop") %>%mutate(seas_start = (as.Date("2000-01-01") +yday(seas_start)-1), seas_end = (as.Date("2000-01-01") +yday(seas_end)-1))# When do they Start/Endeffort_when_p <- sample_periods %>%ggplot() +geom_segment(aes(x = year, xend = year, y = seas_start, yend = seas_end, color = season),show.legend = F, linewidth =1) +scale_color_gmri() +facet_wrap(~area, scales ="free", ncol =1) +scale_y_date(date_breaks ="1 month", date_labels ="%b",limits =as.Date(c("2000-01-01", "2000-12-31"))) +labs(x ="Year", y ="Calendar Date", title ="When Are We Sampling?" )# How Long is the seasonseason_length_p <- sample_periods %>%ggplot(aes(x = year, y = seas_length, color = season)) +geom_line(linewidth =1) +scale_color_gmri() +scale_y_continuous(limits =c(0, 30)) +facet_wrap(~area, scales ="free", ncol =1) +labs(x ="Year", y ="Elapsed Time (days)", title ="How Long Does it Take?")# What are the average seasonal bookends:mab_seas_bookends <- sample_periods %>%group_by(season) %>%summarise(avg_start =mean(seas_start),avg_end =mean(seas_end)) # What is happening during that time of yearmab_glorys <-read_csv(str_c(cs_path("res", "GLORYS/Trawl_EPU_surfbot_timeseries"),"GLORYs_surfbottemp_mid_atlantic_bight_strata.csv"))# Plot the temperature cyclesst_p <- mab_glorys %>%mutate(doy =yday(time),flat_date =as.Date("2000-01-01") + doy-1) %>%pivot_longer(cols =ends_with("temp"),names_to ="var", values_to ="temperature") %>%ggplot() +geom_rect(data = mab_seas_bookends,aes(xmin = avg_start, xmax = avg_end, ymin =-Inf, ymax =Inf, fill = season), alpha =0.4) +geom_line(aes(flat_date, temperature, group =year(time)), alpha =0.25, color ="gray50", linewidth =0.8) +facet_wrap(~var, ncol =1) +scale_fill_gmri() +scale_x_date(date_breaks ="1 month", date_labels ="%b") +labs(x ="Calendar Date", y ="SST", title ="Whats Happening\nOceanographically")(station_effort_p /season_length_p ) | effort_when_p / sst_p
Code
# How far apart are they on average:# How much area are we trying to speak to:# Shapesnew_england <-ne_states("united states of america", returnclass ="sf")mab_area <-read_sf(here::here("Data/raw", "nmfs_trawl_regions_collection.geojson")) %>%filter(area =="Mid-Atlantic Bight") %>%st_as_sf()# Pull data for one yearmap_yr <-2000map_yr_df <- stations %>% sf::st_as_sf(coords =c("decdeg_beglon", "decdeg_beglat"), crs =4326, remove = F) %>%mutate(yr_seas =str_c(year, season, sep ="XX")) %>%filter(year == map_yr) %>%arrange(est_towdate)map_yr_bookends <- map_yr_df %>%split(.$season) %>%map_dfr(~.x %>%filter(est_towdate==max(est_towdate) | est_towdate ==min(est_towdate)))# Make a map of that yearggplot() +geom_sf(data = new_england) +geom_sf(data = mab_area, aes(fill = area), alpha =0.2, color ="gray80") +geom_sf(data = map_yr_df,#data = st_buffer(map_yr_df, dist = 32000),aes(color = season), size =0.5) +geom_path(data = map_yr_df,aes(x = decdeg_beglon, decdeg_beglat, group = season),linewidth =0.3, linetype =3) +geom_label(data = map_yr_bookends,aes(decdeg_beglon, decdeg_beglat, label =str_sub(est_towdate, 7,10))) +scale_color_gmri() +facet_wrap(~season) +coord_sf(xlim =c(-77, -72.8), ylim =c(35.5, 40)) +theme(panel.grid.major =element_blank()) +labs(title =str_c("Example Sample Coverage: ", map_yr),fill ="Area",color ="Survey Seeason")
Effort In Numbers:
The area swept of a “standard tow” of the albatross gear was found in the noaa-edab::survdat repository to be 0.0384\(km^2\).
Code
# How much do we typically tow a seasonavg_seas_effort <- stations %>%group_by(year, season, area) %>%summarise(effort =sum(n())) %>%pull(effort) %>%mean()# How long is it usuallyseas_avg <- sample_periods %>%pull(seas_length) %>%mean()# How much area do we survey?# Area of the regionmab_area_m2 <-st_area(st_union(mab_area)) #/ 1000000 # manual unit changemab_area_km2 <-as.numeric(measurements::conv_unit(mab_area_m2, "m2", "km2"))# area of one towsingle_tow_area <-0.0384# fractionarea_pct <- ((single_tow_area * avg_seas_effort) / mab_area_km2) *100
In a typical season in the MAB:
NOAA samples 51 tows on average, taking place over a 15 day period.
Over each season the trawl sampling will cover 0.0045795% of the total bottom surface of the Mid-Atlantic Bight as a region.
Station locations are selected at random locations, with effort allocation stratified by depth strata.
Code
# Typical Distancetest_sf <- map_yr_df %>%filter(season =="Spring")test_dist <- map_yr_df %>%filter(season =="Spring") %>%st_distance()test_nearest <- test_sf %>%st_nearest_feature()# Get those distances, pitatest_sf$nearest_station <-0for (i in1:nrow(test_sf)) { test_sf$nearest_station[i] <- test_dist[i, test_nearest[i] ]}test_sf$nearest_station %>%mean()
[1] 19232.67
Code
# Do them allclosest_stations <- stations %>% sf::st_as_sf(coords =c("decdeg_beglon", "decdeg_beglat"), crs =4326, remove = F) %>%mutate(yr_seas =str_c(year, season, sep ="XX")) %>%split(.$yr_seas) %>%map_dfr(function(season_subset){ dist_mat <-st_distance(season_subset) nearest_idx <-st_nearest_feature(season_subset)# Get those distances, pita season_subset$nearest_station <-0for (i in1:nrow(season_subset)) { season_subset$nearest_station[i] <- dist_mat[i, nearest_idx[i] ] }return(season_subset) })# What is the typical distance?avg_dist <-mean(closest_stations$nearest_station, na.rm = T) /1000
The average distance between stations in the MAB in a given season is 17km apart.
Is This Dataset Sufficient?
In most cases the trawl survey is the only fisheries independent dataset available, and for many use-cases I do believe that after so many years of standard operation we can make accurate inferences about things like species preferences and other life history parameters.
I do often wonder though, whether we are beyond the limit of what we can reliably* learn from this dataset, as we chase higher-resolution results from a dataset that was never intended to provide details at those resolutions.
Consideration 1: Habitat Heterogeneity
Its intuitive to think that species may have increased abundances near habitat features that provide some sort benefit to a species: protection from predators, forage opportunities, structure.
The seafloor is not uniform and we can resolve differential abundance patterns in space through the use of spatially explicit SDMs.
As an example, here is the long-term biomass spatial patterning for a species:
Code
#coca_path <-cs_path("mills", "Projects/COCA19_Projections/mod_fits")sd_vast <-read_rds(str_c(coca_path, "SpinyDogfish_full_fitted_vast.rds"))dens_dep_range <- sd_vast$Report$Range_raw1 # density dependentdens_ind_range <- sd_vast$Report$Range_raw2 # density independent
For Spiny dogfish, our VAST model used for the COCA reports has two range parameters that indicate the distance up-to-which data points are statistically dependent on one another to some degree.
For this species the density dependent (how many dogfish) range is 194km, and the density-independent presence-absence range is 68.
We also see that using data across all years we see persistant differences in densities in space.
The MAB median size plot has always bothered me. I don’t know why it spikes the way it does, and I don’t know what to say about the community size with it being so volatile.
There are 5 years where mid-atlantic bight size jumps from ~100g to 600-1200g.
These years are: 1983, 1989, 1995, 1996, 2007 and these outliers are all during the spring: